ABSTRACT



An analyzing system analyzes object signals, particularly voice
signals, by estimating a generation likelihood of an observation
vector sequence, which is a time series of feature vectors, with use
of a Markov model having a plurality of states and given transition
probabilities from state to state. A state designation section
determines a state i at a time t stochastically using the Markov
model. Plural predictors, each of which is composed of a neural
network and is provided for each state of the Markov model, generate
a predictional vector of the feature vector $x_t$ in the state i at
the time t based on values of the feature vectors other than the
feature vector $x_t$. A first calculation section calculates an error
vector of the predictional vector relative to the feature vector
$x_t$. A second calculation section calculates a generation likelihood
of the error vector using a predetermined probability distribution
according to which the error vector is generated.

5 Claims, 7 Drawing Sheets

VOICE ANALYZING SYSTEM USING HIDDEN
MARKOV MODEL AND HAVING PLURAL
NEURAL NETWORK PREDICTORS

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a generating system using hidden
Markov models for analyzing time series signals such as voice
signals, and more particularly, to a system providing non-linear
predictors comprised of neural networks.

2. Description of the Related Art 
Conventionally, various systems for analyzing time series signals
such as voice signals have been developed.

As an example of such an analyzing system for time series signals,
FIG. 8 shows an apparatus for speech recognition using a hidden
Markov model (hereinafter referred to as HMM).

In the system shown in FIG. 8, a speech analyzing section 101
transforms input voice signals into a series of feature vectors
using a known method such as a filter bank, Fourier transformation,
or LPC (Linear Predictive Calculation) analysis. Each feature vector
is formed for every predetermined time period (hereinafter referred
to as a "frame"), for instance, 10 msec. Accordingly, the input
voice signals are transformed into a series X of feature vectors
$x_1$ to $x_T$, wherein T is the number of frames. A section 102 is
called a "code book" and has stored therein labeled representative
vectors.
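The framing step described above can be sketched as follows. This is a minimal illustration, not the analysis method of section 101: the per-frame features used here (mean energy and zero-crossing count) are stand-ins chosen only to keep the example short, and the function names `split_into_frames`, `frame_features` and `analyze` are hypothetical.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Split a sample list into consecutive, non-overlapping frames."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len  # an incomplete tail is dropped
    return [samples[t * frame_len:(t + 1) * frame_len] for t in range(n_frames)]


def frame_features(frame):
    """Toy 2-dimensional feature vector (assumption): energy and zero crossings."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, float(crossings)]


def analyze(samples, sample_rate_hz, frame_ms=10):
    """Return the series x_1 ... x_T, one feature vector per frame."""
    frames = split_into_frames(samples, sample_rate_hz, frame_ms)
    return [frame_features(f) for f in frames]
```

With an 8 kHz signal and 10 msec frames, each frame holds 80 samples, so a 160-sample input yields T = 2 feature vectors.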

A vector quantizing section 103 replaces each feature vector of the
vector series X with the label of the representative vector
estimated to be nearest thereto, with reference to the code book
102.
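A minimal sketch of this vector quantization step is given below. It assumes the code book is a mapping from label to representative vector and that "nearest" means smallest squared Euclidean distance; the source does not specify the distance measure, and the names `squared_distance` and `quantize` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(feature_vectors, code_book):
    """Replace each feature vector with the label of its nearest code book entry.

    code_book maps a label to its representative vector (assumed layout).
    """
    labels = []
    for x in feature_vectors:
        nearest = min(code_book, key=lambda k: squared_distance(x, code_book[k]))
        labels.append(nearest)
    return labels
```

For example, with `code_book = {0: [0, 0], 1: [1, 1]}`, the vectors `[0.1, 0.2]` and `[0.9, 0.8]` are replaced by the labels 0 and 1.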

The series of labels thus obtained is sent to a probability
calculation section 106. This section 106 calculates a generation
probability of the label series of an unknown input speech using
HMMs stored in an HMM memory section 105.

These HMMs are formed in advance by an HMM forming section 104. In
order to form the HMMs, the architecture of the HMM, such as the
number of states and the transitions allowed between individual
pairs of states, is first determined. Thereafter, a plurality of
label series obtained by pronouncing a word many times are learned,
and the generation probabilities of the respective labels generated
according to the architecture of the HMM are estimated so that the
generation probabilities of the respective label series become as
high as possible.

The generation probabilities calculated by the section 106 are
compared with each other in a comparison and judging section 107,
which identifies the word corresponding to the HMM that provides
the maximum generation probability among the HMMs corresponding to
the respective words.

The speech recognition using HMMs is effected in the following
manner.

Assume that the label series obtained from an unknown input is
$O = o_1, o_2, \ldots, o_T$ and that an arbitrary state series of
length T generated by the model $\lambda^v$ corresponding to a word v
is $S = s_1, s_2, \ldots, s_T$. The probability at which the label
series O is generated from the model $\lambda^v$ is then given by:

[exact solution]

$$L_1(v) = \sum_{S} P(O, S \mid \lambda^v) \qquad (1)$$

[approximate solution]

$$L_2(v) = \max_{S} \left[ P(O, S \mid \lambda^v) \right] \qquad (2)$$

or, in a logarithmic form,

$$L_3(v) = \max_{S} \left[ \log P(O, S \mid \lambda^v) \right] \qquad (3)$$

wherein $P(x, y \mid \lambda^v)$ is a simultaneous probability density
of x and y in the model $\lambda^v$.

Accordingly, the result of recognition is obtained using one of the
equations (1) to (3), for instance the equation (1), as follows:

$$\hat{v} = \arg\max_{v} \left[ L_1(v) \right] \qquad (4)$$

$P(O, S \mid \lambda)$ is calculated in the case of the equation (1)
as described below.

Assuming that a generation probability $b_i(o)$ of a label o and a
transition probability $a_{ij}$ from one state $q_i$ to another state
$q_j$ (i, j are integers from 1 to I) are given to every state $q_i$
of the model $\lambda$, the generation probability of the label series
$O = o_1, o_2, \ldots, o_T$ with respect to the state series
$S = s_1, s_2, \ldots, s_T$ in the model $\lambda$ is defined as
follows:

$$P(O, S \mid \lambda) = \prod_{t=1}^{T+1} a_{s_{t-1} s_t} \prod_{t=1}^{T} b_{s_t}(o_t)$$

wherein $a_{s_0 s_1}$ is an initial probability of the state $s_1$ and
$s_{T+1} = q_f$ is a final state in which no labels are generated.
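The relation between the path probability $P(O, S \mid \lambda)$, the exact solution (1), the approximate solution (2) and the decision rule (4) can be sketched by brute-force enumeration of state series. This is only an illustration for very small models: practical systems use recursive forward and Viterbi computations instead of enumeration. The data layout (`initial[s]` for $a_{s_0 s}$, `final[s]` for the transition into $q_f$) and all function names are assumptions.

```python
from itertools import product


def path_probability(labels, states, trans, emit, initial, final):
    """P(O, S | model) for one state series S, following the product form above."""
    p = initial[states[0]]                      # a_{s0 s1}
    for t, s in enumerate(states):
        p *= emit[s][labels[t]]                 # b_{s_t}(o_t)
        is_last = t == len(states) - 1
        p *= final[s] if is_last else trans[s][states[t + 1]]
    return p


def likelihoods(labels, n_states, trans, emit, initial, final):
    """Return (L1, L2): sum and maximum of P(O, S | model) over all S."""
    probs = [path_probability(labels, s, trans, emit, initial, final)
             for s in product(range(n_states), repeat=len(labels))]
    return sum(probs), max(probs)


def recognize(labels, models):
    """Equation (4): choose the word whose model maximizes L1(v).

    models maps a word to (n_states, trans, emit, initial, final).
    """
    return max(models, key=lambda v: likelihoods(labels, *models[v])[0])
```

Because the maximum over state series can never exceed the sum, $L_2(v) \le L_1(v)$ always holds; the approximation is good when one state series dominates.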

Although individual input feature vectors x are transformed into
labels in the above example, there is also proposed a method in
which a probability density function of each feature vector x in
each state is given without using labels. In this case, the
probability density $b_i(x)$ of the feature vector x is used instead
of $b_i(o)$ and the above equations (1), (2) and (3) are rewritten as
follows:

[exact solution]

$$L_1'(v) = \sum_{S} P(X, S \mid \lambda^v) \qquad (1')$$

[approximate solution]

$$L_2'(v) = \max_{S} \left[ P(X, S \mid \lambda^v) \right] \qquad (2')$$

or, in a logarithmic form,

$$L_3'(v) = \max_{S} \left[ \log P(X, S \mid \lambda^v) \right] \qquad (3')$$
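The logarithmic form (3') is usually evaluated with a Viterbi-style recursion rather than by enumerating state series. The sketch below assumes normal densities with diagonal covariance for $b_i(x)$, which is a simplification chosen for brevity; the function names and the argument layout (log initial, transition and final probabilities as lists) are hypothetical.

```python
import math

NEG_INF = float("-inf")  # log of a zero probability


def log_gauss_diag(x, mean, var):
    """Log of a normal density with diagonal covariance (assumed form of b_i)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def viterbi_log_likelihood(xs, log_init, log_trans, log_final, means, variances):
    """max over S of log P(X, S | model), computed frame by frame."""
    n = len(log_init)
    score = [log_init[i] + log_gauss_diag(xs[0], means[i], variances[i])
             for i in range(n)]
    for x in xs[1:]:
        score = [max(score[i] + log_trans[i][j] for i in range(n))
                 + log_gauss_diag(x, means[j], variances[j])
                 for j in range(n)]
    # close the path with the transition into the final state q_f
    return max(score[i] + log_final[i] for i in range(n))
```

Working in the log domain turns the products of small probabilities into sums, which avoids numerical underflow for long frame series.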
The final recognition result $\hat{v}$ of the input voice signals X
is given by the following equation in both methods, provided that
the models $\lambda^v$ (v = 1 to V) have been prepared:

$$\hat{v} = \arg\max_{v} \left[ P(X \mid \lambda^v) \right]$$

in which X is a series of labels or a series of feature vectors in
accordance with the method employed.

A typical conventional HMM used for speech recognition is shown in
FIG. 9, in which $q_i$, $a_{ij}$ and $b_i(x)$ indicate an i-th state,
a transition probability from the i-th state to the j-th state, and
a probability density of the label or the feature vector x,
respectively.

In this model, the state $q_i$ is considered to correspond to a
segment i of the speech corresponding to the HMM. Accordingly, the
probability density $b_i(x)$ in the case that x is observed in the
state $q_i$ is considered to be the probability density in the case
that x is generated in the segment i, and the transition probability
$a_{ii}$ is considered to be the probability that $x_{t+1}$ at time
t+1 is again included in the segment i when $x_t$ at time t is
included therein. Accordingly, the following two points may be
identified as drawbacks in the conventional HMM.

(1) [...] features in the time variation of feature vectors are not
suitably represented, since the parameters defining the function
$b_i(x)$ are assumed to be time-invariant, for instance, in the case
where the distribution of x is a normal distribution and accordingly
the parameters are given by a covariance matrix $\Sigma_i$ and an
average vector $\mu_i$.

(2) Since the transition probabilities $a_{ii}$ and $a_{ij}$ are
assumed to be constant regardless of the length of the tribus of the
state $q_i$ in the conventional HMM, although the length $\tau$ of
the segment i is considered to be subject to a probability
distribution, the length of the segment i is subject to an
exponential distribution as the result thereof and, accordingly,
this distribution cannot properly represent an actual state.

In FIG. 10, a result of analysis with respect to a voice signal as
shown in (a) thereof is shown in (c) thereof, which is obtained by
using an HMM as shown in (b) thereof.

As is apparent from comparison of (c) with (a), the resultant
vectors exhibit unnatural jumps between adjacent states.

In order to solve the second problem, there have been proposed
methods in which a Poisson distribution and/or a $\Gamma$
distribution are used as a probability density function
$d_i(\tau)$ related to the length $\tau$ of the tribus of the state
$q_i$.

However, these methods fail to completely solve the problems of the
conventional HMM method.

In the meanwhile, it has been reported that the neural network model
is very effective for pattern recognition and that the feed forward
type neural network exhibits an excellent property for static
patterns. However, it has not been possible to apply the neural
network to non-static signals, such as voice signals, accompanying
non-linear expansion and contraction of the time axis.

SUMMARY OF THE INVENTION

One of the objects of the present invention is to provide a neural
predictive HMM capable of enjoying the advantages of both the HMM
and the neural network model.

Another object of the present invention is to provide an improved
HMM recognition system using predictors composed of neural networks.

A further object of the present invention is to provide a predictor
which is capable of predicting a series of feature vectors or labels
using a neural network model.

In order to achieve these objects, according to the present
invention, it is assumed that the time variation of the feature
vector in the state $q_i$ exhibits a uniform time tendency in the
same segment (state).

BRIEF DESCRIPTION OF THE DRAWINGS

These and other objects and features of the present invention will
become clear from the following description taken in conjunction
with the preferred embodiment thereof with reference to the
accompanying drawings, in which:

FIG. 1 is a [...] of the composition of the speech recognition
system according to the present invention,
FIG. 2 is a [...] estimating process of the [...] according to the
[...],
FIG. 3 is a [...] a neural network to be used for a predictor
according to the present invention,
FIG. 4 [...],
FIG. 5 [...],
FIG. 6 is a [...] probability calculat[...] calculating a generation
probability of an unknown input pattern obtained from the model
according to the present invention,
FIG. 7 is a trellis line graph for showing the concept of the HMM
according to the present invention,
FIG. 8 is a block diagram of a conventional speech recognition
apparatus,
FIG. 9 is a diagram showing a conventional HMM, and
FIG. 10 is an explanatory view for showing vectors obtained by the
conventional HMM.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The symbols used hereinafter are first defined.

For the sake of brevity, states $q_i$, $q_j$ and the like are simply
denoted by suffixes i, j and the like.

Learning of the model is explained with respect to a word v mainly
and, accordingly, this symbol is usually omitted. However, if it
becomes necessary to discriminate the target word v from another
word, the suffix v is added to the right of each parameter.

i = 1, 2, ..., I: i-th state,
$[a_{ij}]$: transition matrix,
$a_{ij}$: transition probability from the state i to the state j,
$x_t^{(r)}$: observation vector in the t-th frame of training
pattern r (r = 1, 2, ..., R),
$b_i(x_t^{(r)})$: probability density of $x_t^{(r)}$ in the state i,
$w_{i\,f\,g}^{u-1\,u}$: weighting coefficient for connection from the
f-th unit of the (u-1)-th layer to the g-th unit of the u-th layer of
the neural network of the state i,
$w_i$: set of weighting coefficients of the neural network of the
state i,
$X^{(r)} = x_1^{(r)}, x_2^{(r)}, \ldots, x_{T^{(r)}}^{(r)}$: r-th
pattern of the word v,
$\hat{x}_t^{(r)}$: output vector (prediction vector of $x_t^{(r)}$)
obtained from the neural net when a partial series of $X^{(r)}$ is
inputted thereinto,
$e_t^{(r)}$: prediction error vector
($e_t^{(r)} = x_t^{(r)} - \hat{x}_t^{(r)}$),
$S^{(r)} = s_1^{(r)}, s_2^{(r)}, \ldots, s_{T^{(r)}}^{(r)}$: series of
states corresponding to $X^{(r)}$,
$s_t^{(r)}$: state in the t-th frame of the r-th training pattern
corresponding to the word v,
$T^{(r)}$: number of frames of the r-th training pattern
corresponding to the word v,
[...]: set of parameters defining weighting coefficients of the
neural network in the state i, probability distribution of the
prediction error vector and that of the length of continuation of
the state i,
$\lambda_i$ = {[...]}: set of parameters of the state i,
$\lambda = \{\lambda_i\}$: set of all parameters (a model defined by
the parameter set $\lambda$ may be called "model $\lambda$"),
$P(X \mid \lambda)$: probability at which the observation vector
series X is generated from the model $\lambda$,
$q_f$: final state ($= s_{T^{(r)}+1}^{(r)}$),
[...]: probability at which the state i is generated in the first
frame (t = 1).

Learning method of HMM corresponding to word v

In this learning method, the parameter set $\lambda$ is estimated to
maximize a probability function
$P(X^{(1)}, X^{(2)}, \ldots, X^{(R)} \mid \lambda)$ defined for the
training patterns (r = 1 to R) prepared about the word v.

Assuming that the $X^{(r)}$ are independent from each other, the
following equation is obtained.

$$P(X^{(1)}, \ldots, X^{(R)} \mid \lambda) = \prod_{r=1}^{R} P(X^{(r)} \mid \lambda) = \sum_{S^{(1)}} \cdots \sum_{S^{(R)}} \prod_{k=1}^{R} P(X^{(k)}, S^{(k)} \mid \lambda) \qquad (7)$$

Herein, an auxiliary function $Q(\lambda, \lambda')$ is defined as
follows:

$$Q(\lambda, \lambda') = \sum_{S^{(1)}} \cdots \sum_{S^{(R)}} \left[ \prod_{k=1}^{R} P(X^{(k)}, S^{(k)} \mid \lambda) \right] \log \prod_{k=1}^{R} P(X^{(k)}, S^{(k)} \mid \lambda') \qquad (8)$$

Referring to equations (7) and (8), the following relation is
obtained.

If $Q(\lambda, \lambda') \ge Q(\lambda, \lambda)$, then
$P(X^{(1)}, \ldots, X^{(R)} \mid \lambda') \ge P(X^{(1)}, \ldots, X^{(R)} \mid \lambda)$,
and when $\lambda' = \lambda$, both become equal.

Accordingly, if it is possible to obtain the following equation:

$$\bar{\lambda} = \arg\max_{\lambda'} \left[ Q(\lambda, \lambda') \right] \qquad (9)$$

$\lambda$ converges at a stationary point of
$P(X^{(1)}, \ldots, X^{(R)} \mid \lambda)$, i.e., a point which gives a
maximum value or a saddle point thereof, by applying the equation
(9) repeatedly while substituting the $\bar{\lambda}$ obtained for
$\lambda$. Accordingly, a local optimal solution is obtained by
repeating the above calculation until the rate of change of
$P(X^{(1)}, \ldots, X^{(R)} \mid \lambda)$ becomes smaller than a
predetermined threshold value.

Next, a method for estimating parameters using
$Q(\lambda, \lambda')$ is explained.

The following equation is obtained by transforming the equation (8).

$$Q(\lambda, \lambda') = \prod_{k=1}^{R} P(X^{(k)} \mid \lambda) \sum_{r=1}^{R} \frac{1}{P(X^{(r)} \mid \lambda)} \sum_{S^{(r)}} P(X^{(r)}, S^{(r)} \mid \lambda) \log P(X^{(r)}, S^{(r)} \mid \lambda') \qquad (10)$$

According to the explanation mentioned above, if $\lambda'$ which
gives $Q(\lambda, \lambda') > Q(\lambda, \lambda)$ is found out,
assuming that $Q(\lambda, \lambda')$ is a function of $\lambda'$, it
becomes a renewed one of $\lambda$. However, this is equivalent to
finding $\lambda'$ which gives the inequality
$Q^*(\lambda, \lambda') > Q^*(\lambda, \lambda)$ assuming the
following equation,

$$Q^*(\lambda, \lambda') = \frac{Q(\lambda, \lambda')}{P(X^{(1)}, \ldots, X^{(R)} \mid \lambda)} = \sum_{r=1}^{R} c^{(r)} \sum_{S^{(r)}} P(X^{(r)}, S^{(r)} \mid \lambda) \log P(X^{(r)}, S^{(r)} \mid \lambda') \qquad (11)$$

wherein $c^{(r)} = 1 / P(X^{(r)} \mid \lambda)$.

The equation (11) is further transformed as follows:

(12) [...]

If the HMM now considered is of a left to right type and,
accordingly, the state never returns to a state having been left,
the following two equations (13) and (14) are derived from the
equation (12), assuming that $t_i(S^{(r)})$ denotes a starting time
of the state i in the state series $S^{(r)}$ and
$\tau_i(S^{(r)}) = t_j(S^{(r)}) - t_i(S^{(r)})$ denotes a hold time
of the state i (the state j is assumed to be a next state following
the state i, as shown in FIG. 2).

(13) [...]
(14) [...]

In the equation (14), $P_t^{(r)}(i, j, \tau)$ is defined as follows,
provided that $i_1$ indicates an initial state in the tribus of the
state i.

[...]

It is a simultaneous probability density of $X^{(r)}$, [...] and
$s_t^{(r)} = j_1$ in the model $\lambda$ and, in other words, denotes
a total sum of generation probability densities of paths passing
through the hatched areas shown in FIG. 2. Using this relation, it
becomes possible to calculate the total sum related to the path
$S^{(r)}$ in the equations (13) and/or (14) at two divided stages.

Namely, the total sum is first calculated with respect to paths
passing through the hatched areas in the entire range of t, $\tau$
and j and, then, the total sum is calculated over the entire range
of t, $\tau$ and j obtained as results. Transformation from the
first equation to the second one in the equation (13) or (14) is
based on this consideration.

Said [...] can be obtained as follows.

Namely, defining the following equations

(15) [...]

it is represented as follows:

(16) [...]

Then, the following recurrence formulae are introduced.

(17) [...]

(18) [...]

Accordingly, the equation (16) can be calculated by calculating
$\alpha_t^{(r)}(j)$ from t = 1 to $T^{(r)}+1$ and j = 1 to I in turn
according to the equation (17) after setting
$\alpha_1^{(r)}(1) = 1$ and giving suitable initial values to the
parameter set $\lambda$, and by calculating $\beta_t^{(r)}(i)$ from
t = $T^{(r)}+1$ to 1 and from i = I to 1 in turn according to the
equation (18) after setting [...].

Estimation of transition probability $a_{ij}$

A Lagrange multiplier method is used therefor.

Using the equation (13) and the following equation;

[...]

the following equation is obtained

[...]

From this equation, $\theta$ is obtained by multiplying $a_{ij}'$ on
both sides of the equation and, thereafter, taking a sum over j as
follows;

[...]

Accordingly, the reestimation value of $a_{ij}$ is given by the
following equation.

(19) [...]

Estimation of probability density of length of the tribus of the
state

In this process, parameters defining the probability density of the
length of the tribus of the state i are estimated.

For example, if $a_{ii} = \gamma_i$ (constant), then

[...]

From the equation (14) and by setting [...], the following equation
is obtained

[...]
and accordingly, the reestimation value of $\gamma_i$ is given by the
following equation.

(20) [...]

If $d_i(\tau)$ obeys another distribution, for instance, a Poisson
distribution as follows;

[...]

the following equation is obtained in a manner similar to the above.

[...]

Further, $\gamma_i$ is thereby given by the following equation.

(21) [...]

Estimation of parameters related to $b_i(x)$

In the conventional HMM, $b_i(x)$ is usually defined as a probability
density of the feature vector x in the state i. Namely, if this
probability density is assumed to obey a normal distribution, it is
represented by

$$b_i(x_t) = N(x_t; \mu_i, \Sigma_i)$$

wherein $\mu_i$ is an average vector and $\Sigma_i$ is a
variance-covariance matrix. Since $\mu_i$ is invariant while the
state i continues, the value of a product of probability densities
such as $b_i(x_{t-1})\, b_i(x_t)\, b_i(x_{t+1})$ is irrelevant to the
order of generation of the $x_t$. Accordingly, representation of
dynamic features of a time series is insufficient in the
conventional model.

In contrast, in the present invention $\mu_i$ is treated as time
variant by accepting influence due to the past series of feature
vectors. Namely, a predictive value for a feature vector $x_t$ at a
time t of input voice signals is predicted by a non-linear predictor
comprised of a neural network defined in every state, and $\mu_i$ is
replaced by the predictive value $g_i(t)$ as follows:

$$b_i(x_t) = N(x_t; g_i(t), \Sigma_i) = N(x_t - g_i(t); 0, \Sigma_i)$$

Thus, $b_i(x_t)$ is a probability density of the predictive error.

For instance, assuming that $x_t$ is predicted from a time series
$x_{t-c_1}, x_{t-c_2}, \ldots, x_{t-c_M}$ ($c_1 < c_2 < \ldots < c_M$)
and the predictive error is given by a normal distribution,
$b_i(x_t)$ is given as follows, wherein d is the dimension of the
feature vector.

$$b_i(x_t) = \left[ \frac{|\Sigma_i^{-1}|}{(2\pi)^d} \right]^{1/2} \exp\left[ -\frac{1}{2} (x_t - g_i(t))^T \Sigma_i^{-1} (x_t - g_i(t)) \right]$$

By taking the logarithm of the above equation, the following is
obtained.

$$\log b_i(x_t) = -\frac{d}{2} \log 2\pi + \frac{1}{2} \log |\Sigma_i^{-1}| - \frac{1}{2} (x_t - g_i(t))^T \Sigma_i^{-1} (x_t - g_i(t)) \qquad (22)$$

Assuming that $W_i$ is a set of weighting coefficients of the neural
network in the state i,
$h_i(t) = h(x_{t-c_1}, x_{t-c_2}, \ldots, x_{t-c_M}; W_i)$ is an
output of the neural network to inputs
$x_{t-c_1}, x_{t-c_2}, \ldots, x_{t-c_M}$, which is a function
characterized by $W_i$, and $\mu_i$ is a vector which is held
constant in the state i, $g_i(t)$ can be represented by the equation
$g_i(t) = \mu_i + h_i(t)$.

A reestimation equation of $\mu_i$ can be introduced in a manner
similar to the introduction of the average vector in the parameter
estimation of the conventional HMM. However, it is also possible to
include $\mu_i$ in $W_i$, considering it as a bias input to the final
layer of the neural network.

In this preferred embodiment, the latter case is explained. Further,
$\Sigma_i = [\sigma_{imn}] = [\sigma_i^{mn}]^{-1}$ is a
variance-covariance matrix of $x_t - g_i(t)$. Accordingly, the
parameters to be estimated are the variance-covariance matrix
$\Sigma_i$ and the weighting coefficients $W_i$.

FIG. 3 shows an example of the parallel type neural network employed
in the present invention. The small circles denote so-called units.
Each arrow line connecting two units denotes a flow direction of
each signal, and an output of a unit connected to the rear end of an
arrow line is inputted to a unit connected to the front end of the
arrow line after being multiplied by a weighting coefficient given
to the line individually. An alignment of units in a transverse
direction is called a "layer". In this preferred embodiment, a
neural network comprised of three layers is used. The first layer,
comprised of units into which inputs to the neural network are
directly applied, is called an "input layer". The final layer,
comprised of units from which respective outputs of the neural
network are obtained, is called an "output layer". Other layers are
called "hidden layers." Accordingly, the neural network of this
embodiment is comprised of the first, second and third layers which
make the input, hidden and output layers, respectively. [...] a sum
of [...] function which is [...].

FIG. 4 shows an example of a sigmoid function.

Usually, the input-output characteristic of each unit belonging to
the input layer is given by a linear function and, therefore, it
transmits the input itself. Further, the input-output characteristic
of each unit belonging to the output layer is usually given by a
sigmoid function or a linear function. In this preferred embodiment,
the input-output characteristic of the final layer is given by a
sigmoid function as in the case of the hidden layer.
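The per-state predictor and the density of its prediction error can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the past frames are simply concatenated as the network input, the bias terms play the role of $\mu_i$ absorbed into the weights, and the covariance is taken as diagonal to keep the code short, whereas the text uses a full variance-covariance matrix $\Sigma_i$. All function and parameter names are hypothetical.

```python
import math


def sigmoid(z):
    """Sigmoid input-output characteristic of a unit."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: weighted sum of inputs plus bias, passed through a sigmoid."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def predict(past_frames, w_hidden, b_hidden, w_out, b_out):
    """g_i(t): prediction of x_t from the past frames x_{t-c_1}, ..., x_{t-c_M}."""
    inputs = [v for frame in past_frames for v in frame]  # linear input layer
    hidden = layer(inputs, w_hidden, b_hidden)
    return layer(hidden, w_out, b_out)


def log_density(x_t, prediction, variances):
    """log b_i(x_t) for a normal prediction error, diagonal covariance assumed."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - g) ** 2 / v)
               for x, g, v in zip(x_t, prediction, variances))
```

With a sigmoid output layer the predictions lie between 0 and 1, so the feature vectors would have to be scaled into that range; the source does not state how this is handled.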

Further, in the preferred embodiment, it is assumed that there is
one unit in the intermediate layer which has a unit input and a
linear input-output characteristic and outputs a bias input to each
unit of the final layer.

FIG. 5 shows an example of the composition of the neural network
predictor for predicting $x_t^{(r)}$ from [...], assuming the k-th
component of $x_t^{(r)}$ [...].

Next, the estimation method for these parameters is explained.

(a) Estimation of weighting coefficients

Assuming that a weighting coefficient of connection from the f-th
unit of the (u-1)-th layer to the g-th unit of the u-th layer in the
state i is represented by $w_{i\,f\,g}^{u-1\,u}$, a correction amount
of the weighting coefficient in the learning iteration is given as
follows.

(23) [...]

Assuming that [...] denotes a total sum of inputs to the g-th unit
of the u-th layer in the state i, wherein [...] and
$s_t^{(r)} = j_1$ continues k times in the r-th learning pattern,
that [...] denotes the output of the g-th unit of the u-th layer in
the state i under conditions similar to those mentioned above, and
that f(.) denotes a sigmoid function in each unit, the partial
differential of the right side of the equation (23) is transformed
as follows.

(24) [...]

If u ≠ U (final layer), the following equation is introduced.

(25) [...]

In the final layer U, since

[...]

provided that the output of the g-th unit corresponds to the g-th
component of the prediction error vector $e_t = x_t - \hat{x}_t$,
the following equation is obtained.

(26) [...]

Accordingly, the equations (24), (25) and (26) may be rewritten by
setting the following equation.

[...]
[...]

Thus, the point which gives a local maximum value of
$Q^*(\lambda, \lambda')$ with respect to the weighting coefficients
of the state i is estimated according to the following steps in the
iteration of parameter estimation.

Step (w1): Providing initial values of $w_{i\,f\,g}^{u-1\,u}$ with
respect to all possible values of u, f and g.
Step (w2): Calculating $L_1 = Q^*(\lambda, \lambda')$.
Step (w3): Calculating [...] with respect to all possible values of
r, j, t, $\tau$, k, g and u according to the equations (26') and
(25'). With respect to u, it is calculated in the order of u = U,
U-1, ..., 1.
Step (w4): Calculating $\Delta w_{i\,f\,g}^{u-1\,u}$ with respect to
all possible values of u, f and g.
Step (w5): Replacing $w_{i\,f\,g}^{u-1\,u}$ with
$w_{i\,f\,g}^{u-1\,u} + \Delta w_{i\,f\,g}^{u-1\,u}$.
Step (w6): Calculating $L_2 = Q^*(\lambda, \lambda')$.
Step (w7): Completing the estimation process (of one time) for the
weighting coefficients in the state i by adopting the renewed
values, if the ratio of improvement of $L_2$ to $L_1$ becomes
smaller than a predetermined threshold value. If said ratio is
larger than the threshold value, the process returns to step (w3)
after setting $L_2$ to $L_1$.

The algorithm mentioned above is employed for searching a peak of a
target function in a space constituted by $\lambda$ with use of the
steepest descent method as one of the iteration methods.

Namely, in each stage of the iteration calculation, $\lambda$ is
renewed in such a direction that the value of the target function
$P(X^{(1)}, \ldots, X^{(R)} \mid \lambda)$ becomes larger than the
most previous one and, accordingly, $\lambda$ is renewed once for
the vector series of R vectors.

Contrary to this method, there is known a method called the
"probability approximation method", wherein a peak higher than a
local peak may be obtained. In that algorithm, the direction in
which $P(X^{(r)} \mid \lambda)$ increases is used with respect to
each $X^{(r)}$. Namely, $\lambda$ is renewed R times with respect to
the vector series of R vectors. In this case, $\lambda$ is not
always renewed at each stage of the R renewals in such a direction
that the target function $P(X^{(1)}, \ldots, X^{(R)} \mid \lambda)$
increases but, when averaged over the R renewals, $\lambda$ is
renewed in the above direction.

The estimation method in this case is made according to the
following steps.

An auxiliary function for $X^{(r)}$ is introduced by the following
equation

[...]

(w1'): Providing initial values of $w_{i\,f\,g}^{u-1\,u}$ with respect
to [...].
(w2'): Calculating [...].
(w3'): Executing steps (w4') and (w5') with respect to r = 1 to R.
(w4'): Calculating [...] with respect to all possible values of j,
t, $\tau$, k, g and u according to the equations (26') and (25').
With respect to u, it is calculated in the reverse order of u = U,
U-1, ..., 1.
(w5'): Calculating $\Delta w_{i\,f\,g}^{u-1\,u}$ with respect to all
possible values of u, f and g according to the equations [...] and
replacing $w_{i\,f\,g}^{u-1\,u}$ with
$w_{i\,f\,g}^{u-1\,u} + \Delta w_{i\,f\,g}^{u-1\,u}$ with respect to
all possible values of u, f and g.
(w6'): Calculating $L_2$ = [...].
(w7'): Completing the estimation process (of one time) for the
weighting coefficients in the state i by adopting the renewed
values, if the ratio of improvement of $L_2$ to $L_1$ becomes
smaller than the predetermined threshold value. If said ratio does
not become smaller than the threshold value, the process returns to
step (w3') after setting $L_2$ to $L_1$.

(b) Estimation of variance-covariance matrix

Noticing that [...] and accordingly [...], which are defined as
cofactors of the former, using

[...]

Since

[...]

the following relation is obtained.

[...]

Accordingly, the following equation is obtained. 5 (4) cstimalion values of a .. with rcspect t0 

i=l to M and j = 2 to I according to the equation (19). 
0 .JiML x2 . Namely, 



15 



Thus, the reestimation formula of X/ is obtained as X< * d^'S 7> K X,-> 

follows; 



with respect to i = l to 1-1. 
Estimation of parameters related to probability distri- 
X c^I numir) (27) 20 bution of tribus length of state 

' =1 l ' num (6) Executing step(7) with respect to r = 1 to R. 



i t &lummW (7) Calculating 

wherein * P ( fl(v> 

X^W- (28) 25 

*X +1 'x* X /><;></./) x * ( rir~i+A an <* cW with respect to t=2 to T<')+ 1, t= 1 to t-1, i= 1 

t=i t*1;«h^/) ' t0 j.| ^ j_2 l0 i according to the equations (16) to 

ind (18) assuming X = {X,}. 

7<r) +1 ,_j 7 (29) 3Q Further calculating y/,««m(r) and ytfenomfr) accord- 

Xi,rfr»om (') = ^ r £ J; . = X^pV ) t(^> ing to the following equation 

The actual calculation procedure for the parameter 7<'>+i /-i / M 

estimation is follows. " rl 2 r£ i^tV'' 0 '*' " ° 

Upon forming a mode X v corresponding to the word 35 
V, it is assumed that the pattern , , 7«+i r-l / 

* 0 " *1 } (8) Calculating estimation formula of y/ with respect 

40 to i = l to I according to the equation (20) as follows. 

is given as training patterns beforehand wherein r = 1 to 

R, x/') t-th feature vector of the pattern r and W is a R 
number of frames of the pattern r. y , „ i &n„M/ 1 (^WwmC) 

It is further assumed that j > i, I = f, i = 1 to M , j = 2 to r=1 r_1 

1 M^LltU value!' 45 < 9 > ^placing y, with y/ in the equation 



(1) Providing suitable initial values to 



X/ - 7h X,) 



50 with respect to i = l to I — 1. 

with respect to i - 1 to I. fST* 0 " ° f ™* h ?P* C °f^! Cnt . S . t , • 

Estimation of transition probability JK)) Executing steps (1 1) and (12) with respect to r= 1 



(2) Executing step (3) with respect to r= 1 to R. Calculating 

(3) Calculating 55 UU calculating 



with respect to «-2 to W+l. r-1 to t-1, i-1 to 1-1 „ "\ C ^l-^ 

and J-X to , according to the equations (16) to (18) « of thc ncu . 

assuming A —\Ai). 4 ral network is made by executing steps (wl) to (w8) or 

Further calculating a^r) and a^WO accord- (w}t) %Q (wT) ^ fcspcct to i * 1 to I — 1. The estima- 
mg to the following equation. (jon yaJuc of thc weighting coefficient from the m-th 

65 unit of the (u— 1)— th layer to the u-th unit of the n-th 
7<')+i ,_| unit is defined as w^> m V 

•«wnW - f * 2 T I,;R(w) (13) Replacing w^» ffl % (for all u t m and n) in the 

equation 
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*/ = fa^J, *n I/) 

with respect to i = l to 1-1. 
Estimation of Variance-co variance matrix. 

(14) Executing step(15) with respect to r=l to R. 

(15) Calculating 



10 



and cW with respect to t =2 to TW+ 1, t = 1 to t - 1, i = 1 
to I — 1 and j=2 to I according to the equations (16) to 
( 1 8) assuming X = {X,}. j 5 

Further calculating the following equation (of the 
equations (28) and (29)). 

14*. (D = ^ 'I) jjfe/tr tt» X r*l r _, +Jk * t <»- l+k T 20 
f =>2 r= 1 y=>|(^ti) 



In this preferred embodiment, a,y=l for j=i + 1 and 
a//=0 for i+1 are assumed. 

First, the functions of the respective sections of the system are explained.

801 ... Feature extraction section which transforms voice signals of respective training words r = 1 to R into a series of feature vectors.

802 ... Word pattern memory section which stores a plurality of the feature vector series (word patterns) corresponding to the training words for forming a model λ (in this embodiment the number of them is R).

803 ... Buffer memory which temporarily stores one of the word patterns stored in the word pattern memory 802.

804 . . . Partial probability calculation section which 
calculates 



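(16) Calculating the estimation value $\hat{\Sigma}_i$ with respect to i = 1 to I according to the equation (27).

$$\hat{\Sigma}_i = \sum_{r=1}^{R} \Sigma_{i,\mathrm{num}}(r) \Big/ \sum_{r=1}^{R} \Sigma_{i,\mathrm{denom}}(r)$$

(17) Replacing $\Sigma_i$ with $\hat{\Sigma}_i$ in the equation

$\lambda_i = \{\gamma_i, W_i, \Sigma_i\}$

with respect to i = 1 to I−1.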
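
Steps (16) and (17) pool a numerator and a denominator over all R training words before dividing. The following is a minimal sketch of that pooling, not the patent's implementation: the function name `pooled_covariance`, the nested-list matrix representation, and the example numbers are assumptions made for illustration.

```python
def pooled_covariance(numerators, denominators):
    """Divide the summed per-word numerator matrices by the summed denominators.

    numerators   : list of R square matrices (nested lists), one per training word
    denominators : list of R scalars, one per training word
    Both inputs are hypothetical stand-ins for the per-word quantities
    produced for a single state i.
    """
    total_den = sum(denominators)
    if total_den == 0:
        raise ValueError("the summed denominator must be non-zero")
    size = len(numerators[0])
    total_num = [[0.0] * size for _ in range(size)]
    for matrix in numerators:
        for a in range(size):
            for b in range(size):
                total_num[a][b] += matrix[a][b]
    # One division at the end: the ratio of the two cumulative sums.
    return [[value / total_den for value in row] for row in total_num]


# Two training words, 2-dimensional error vectors (made-up numbers).
sigma_hat = pooled_covariance(
    [[[2.0, 0.0], [0.0, 4.0]], [[4.0, 2.0], [2.0, 8.0]]],
    [2.0, 4.0],
)
```

Dividing the cumulative sums once weights each training word by its denominator, which generally differs from averaging the per-word ratios.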

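Evaluation of degree of improvement of parameters

(18) Calculating $L_2$ from the log likelihoods $\log P(X^{(r)} \mid \lambda)$ of the training patterns r = 1 to R, assuming $\lambda = \{\lambda_i\}$ [the exact form of the equation is illegible in the source].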

(19) If $|L_1 - L_2| / L_1 > \delta$, the process returns to step (2) after setting $L_1 = L_2$. Otherwise, the process is completed.

In step (19), δ is a small positive number that determines the width of convergence. If δ is too small, inconveniences such as an excessive convergence time and overlearning are caused, although the accuracy of the parameter estimation is enhanced. The overlearning is caused for the following reason. Though each parameter can be optimized more and more to the learning samples by repeating the learning iteration, this is true only for the learning samples, and the parameters are not always optimized for samples other than the learning samples. Of course, in the case where there are enough learning samples and the characteristic of the population is well reflected in them, the overlearning does not occur. If δ is set at a relatively large value, earlier convergence is obtained but the accuracy is lowered. Accordingly, the value of δ is to be set at a value suitable for the actual situation to be trained.
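
The stopping rule of step (19) can be sketched as a loop around one re-estimation pass. This is an illustration under stated assumptions: `reestimate` and `log_likelihood` are hypothetical callables standing in for steps (2) to (17) and step (18), `max_iter` is an added safety limit, and the absolute value of $L_1$ is used in the denominator because a log likelihood is usually negative.

```python
def train_until_converged(reestimate, log_likelihood, params,
                          delta=1e-4, max_iter=100):
    """Repeat re-estimation until the relative change of L drops to delta or below.

    reestimate(params) -> new params       (hypothetical: one pass of steps (2) to (17))
    log_likelihood(params) -> float        (hypothetical: step (18))
    Returns (params, last likelihood, number of passes executed).
    """
    l1 = log_likelihood(params)
    for iteration in range(1, max_iter + 1):
        params = reestimate(params)
        l2 = log_likelihood(params)
        # Step (19): stop when |L1 - L2| / |L1| is no longer greater than delta.
        # Assumes L1 is non-zero.
        if abs(l1 - l2) / abs(l1) <= delta:
            return params, l2, iteration
        l1 = l2
    return params, l1, max_iter
```

A smaller `delta` lets the loop run for more passes, which matches the trade-off described above between accuracy on the learning samples, convergence time and overlearning.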

FIG. 1 shows a block diagram of the HMM generat- 
ing system according to the present invention. 






the partial probabilities and $c^{(r)}$ prior to the estimation of parameters in the state i.

805 ... Calculation section of the expected value of the length of the tribus which calculates expected values of the lengths of tribuses τ and τ−1 in the state i regarding paths. These are the denominator and numerator of the equation (20) for the transition probability. The expected value of τ obtained also becomes the denominator of the equation (28) which gives the variance-covariance matrix of error signals.

811 ... Calculation section of error variance-covariance matrix numerator which calculates the equation (29).

813 ... First parameter (weighting coefficient) calculation section which executes the calculation from step (w1) to step (w9) to obtain the estimation value $\hat{w}$ of the weighting coefficient.

806 ... First cumulative sum calculation section which calculates a cumulative sum of values calculated regarding each training word by the tribus length calculation section 805.

812 ... Second cumulative sum calculation section which calculates a cumulative sum of values calculated regarding each training word by the error variance-covariance matrix numerator calculation section 811.

807 ... Second parameter calculation section which calculates a ratio of the numerator and denominator of each parameter calculated by the first and second cumulative sum calculation sections 806 and 812, respectively, and thereby obtains estimation values of the transition probability $\gamma_i$ of the state i and the variance-covariance matrix $\Sigma_i$ of the estimation error.

808 ... Parameter memory for storing estimated parameters.

809 ... Total probability calculation section which calculates a total sum of the probabilities $P(X^{(r)} \mid \lambda)$ for all r based on estimation values of parameters stored in the parameter memory 808.

810 ... Total probability memory for storing the total probability calculated by the section 809.

816 ... Control section which executes a setting of various quantities for respective sections and various commands in relation to respective sections.

In this system, the formation of the model $\lambda^v$ corresponding to the word v is made as follows.

Voice signals obtained by pronouncing the word v R times are transformed into R patterns as feature vector series by the feature extraction section 801 and, then, the R patterns are stored in the word pattern memory 802. By a read command for the learning word from the control section 816, the r-th word pattern $X^{(r)}$ (r = 1 to R) is read out from the word pattern memory 802 and is stored in the buffer memory 803 temporarily. Then, t and τ obtained in the parameter calculation are supplied to the buffer memory 803 as frame setting signals from the control section 816 and the partial probability calculation section 804 calculates the partial probabilities and $c^{(r)}$ with respect to the corresponding frame of the word pattern stored in the buffer memory 803. The calculation section of the expected value of the tribus length 805 calculates the denominator and numerator of $\gamma_i$ based on the values obtained. The values $\lambda_i = \{\gamma_i, W_i, \Sigma_i\}$ stored as parameters in the state i in the parameter memory 808 are used for parameters included in the denominator and numerator upon the above calculation.

The first and second cumulative sum calculation sections 806 and 812 calculate respective cumulative sums of the denominators and numerators with respect to the training word patterns $X^{(r)}$ (r = 1 to R), respectively. The parameter calculation section 807 calculates a new estimation value of the transition probability $\gamma_i$ of the state i by taking the ratio of the cumulative sum of the numerators to that of the denominators which have been calculated with respect to the state i of the training word patterns $X^{(r)}$ (r = 1 to R). This is effected for all i from 1 to I. The parameter memory 808 stores the new estimation values of the transition probability as renewed parameters

$\lambda = \{\lambda_i\}$

corresponding to the word v in place of the old ones. The total probability calculation section 809 calculates the total probability for all training words using the renewed parameters λ in the manner mentioned above and compares it with the total probability stored in the total probability memory 810. The result of the comparison is sent to the control section 816 and the new total probability obtained is stored in the memory 810. The control section 816 stops the parameter estimation calculation if the comparison shows no further improvement, namely if the difference between the new one and the old one becomes smaller than the predetermined threshold value. If it is larger than the threshold value, the estimation of the variance-covariance of the estimation error is executed according to the equation (28).

Namely, by a read command for the learning word from the control section 816, the r-th word pattern $X^{(r)}$ (r = 1 to R) is read out from the word pattern memory 802 and is stored in the buffer memory 803 temporarily. Then, t and τ obtained in the parameter calculation are supplied to the buffer memory 803 as frame setting signals from the control section 816 and the partial probability calculation section 804 calculates the partial probabilities and $c^{(r)}$ with respect to the corresponding frame of the word pattern $X^{(r)}$ stored in the buffer memory 803. The calculation section of the expected value of the tribus length 805 calculates the denominator of the variance-covariance of the estimation error based on the values obtained and the numerator thereof is calculated by the calculation section 811. At that time, as parameters included in the denominator and numerator, the values $\lambda_i = \{\gamma_i, W_i, \Sigma_i\}$ stored as parameters in the state i in the parameter memory 808 are used. The first cumulative sum calculation section 806 calculates the cumulative sum of the denominators regarding all training word patterns $X^{(r)}$ (r = 1 to R) and the second one 812 calculates that of the numerators regarding them. The parameter calculation section 807 calculates new estimation values of the variance-covariance of the estimation error of the state i by taking the ratio of the cumulative sum of the numerators to that of the denominators which have been calculated with respect to the state i of the training word patterns $X^{(r)}$ (r = 1 to R). This is repeated for i from 1 to I. The parameter memory 808 stores the new estimation values of the variance-covariance of the estimation error as renewed parameters

$\lambda = \{\lambda_i\}$

corresponding to the word v in place of the old ones. The total probability calculation section 809 calculates the total probability for all training words and compares it with the total probability which has been calculated by use of the latest parameters and stored in the memory 810. The result of the comparison is sent to the control section 816 and the new total probability obtained is stored in the memory 810. The control section 816 stops the parameter estimation calculation if the comparison shows no further improvement. If the result indicates that the difference between the new one and the old one is still larger than the predetermined threshold value, the estimation of the weighting coefficients $W_i$ is executed as follows.

Responsive to a read command for the learning word from the control section 816, the r-th word pattern $X^{(r)}$ (r = 1 to R) is read out from the word pattern memory 802 and is stored in the buffer memory 803 temporarily. Then, t and τ obtained are supplied to the buffer memory 803 as frame setting signals from the control section 816 and the partial probability calculation section 804 calculates the partial probabilities and $c^{(r)}$ with respect to the corresponding frame of the word pattern $X^{(r)}$ memorized in the buffer memory 803. The parameter calculation section 813 calculates the weighting coefficients based on the values obtained. The values $\lambda_i = \{\gamma_i, W_i, \Sigma_i\}$ stored as parameters in the state i in the parameter memory 808 are used for parameters included in the parameter calculation upon the above calculation. This is repeated with respect to i = 1 to I.

The parameter memory 808 stores the new estimation
values of the prediction coefficients as renewed parameters

$\lambda = \{\lambda_i\}$

corresponding to the word v in place of the old ones. The total probability calculation section 809 calculates the total probability for all training words using the renewed parameters λ in the manner mentioned above and compares it with the total probability stored in the total probability memory 810. The result of the comparison is sent to the control section 816 and the new total probability obtained is stored in the total probability memory 810. The control section 816 stops the parameter estimation calculation if the comparison shows no further improvement, namely if the difference between the new one and the old one becomes smaller than the predetermined threshold value. If it is larger than the threshold value, the calculation of the transition probability is executed again.

By repeating these calculations until the difference becomes smaller than the predetermined threshold value, the parameters

$\lambda_i = \{\gamma_i, W_i, \Sigma_i\}$

converge to respective constant values. These parameters thus converged are those to be sought. In other words, the model is thus obtained.

Method and Apparatus for Speech Recognition

Next, the method and apparatus for recognizing actual input speech using the model mentioned above are explained.

A so-called exact solution is obtained as the result of recognition $\hat{v}$ which gives the maximum value of $P(X \mid M^v)$, by calculating $P(X \mid M^v)$ for v from 1 to V when an unknown input pattern X is inputted. This is obtained by replacing $X^{(r)}$ with X and M with $M^v$ in the process for calculating the probability $c^{(r)} = P(X^{(r)} \mid M)$ of the model M corresponding to the input pattern $X^{(r)}$ upon forming the model mentioned above.

Hereinafter, a method for obtaining an approximate solution corresponding to the equation (2') is explained.

Assuming $\phi(i,t)$ as the maximum cumulative probability at the state i at time t, the following recurrence formula is obtained corresponding to the equation (2').

$$\phi(i+1,t) = \max_{\tau} \left[ \phi(i,t-\tau)\, d_i(\tau) \prod_{k=1}^{\tau} b_i(x_{t-\tau-1+k}) \right] \qquad (30)$$

wherein $d_i(\tau)$ denotes the probability density at which the tribus length of the state i becomes τ and $b_i(x)$ denotes the probability density of the prediction error in the state i.

Accordingly, by calculating $\phi(i,t)$ for i = 1 to I and for t = 2 to T+1 in turn, $\phi(I,T+1)$ becomes the maximum probability of the model λ (model M) to the input pattern X.

Since it becomes necessary to calculate the equation (30) for all combinations of τ = 1 to t−1 and i = 1 to I with respect to every frame t (= 1 to T), the volume of the calculations becomes too large.

However, the calculation volume can be reduced by utilizing past calculation values. Here, the following quantity is defined for later discussion.

$$B(i,t,\tau) = \prod_{k=1}^{\tau} b_i(x_{t-\tau-1+k}) \qquad (31)$$

From this, the following relations are obtained.

$$B(i,t,1) = b_i(x_{t-1}), \quad B(i,t,2) = B(i,t,1)\, b_i(x_{t-2}), \quad \ldots, \quad B(i,t,\tau) = B(i,t,\tau-1)\, b_i(x_{t-\tau}) \qquad (32)$$

Also, $d_i(\tau)$ is stored in a table in advance by calculating it for τ = 1 to T. In this case, the equation (31) can be calculated in the following manner after setting B(i,t,0) = 1.

(1) Calculating the following equation (33) with respect to τ = 1 to t−1.

$$\eta(\tau) = \phi(i,t-\tau)\, B(i,t,\tau)\, d_i(\tau) \qquad (33)$$

(2) $\phi(i+1,t) = \max_{\tau} [\eta(\tau)]$
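
Steps (1) and (2) can be sketched as follows. This is an illustrative sketch rather than the patent's implementation: the function name `max_path_probability`, the callables `b(i, frame)` and `d(i, tau)`, and the convention that the last state is a non-emitting termination state entered at time T+1 are assumptions. The running product `big_b` plays the role of B(i,t,τ) and is extended by one factor per τ, as in the relations (32).

```python
def max_path_probability(x, b, d, num_states):
    """Maximum cumulative probability phi(I, T+1) of a left-to-right model.

    x          : list of T frames (x[0] is frame 1)
    b(i, frame): hypothetical density of the prediction error of `frame` in state i
    d(i, tau)  : hypothetical probability that state i lasts tau frames
    num_states : I, where state I is assumed to be the termination state
    """
    T = len(x)
    # phi[i][t] with 1-based state and time indices; phi(1, 1) = 1 starts the path.
    phi = [[0.0] * (T + 2) for _ in range(num_states + 1)]
    phi[1][1] = 1.0
    for i in range(1, num_states):
        for t in range(2, T + 2):
            big_b = 1.0          # B(i, t, 0) = 1
            best = 0.0
            for tau in range(1, t):
                # B(i,t,tau) = B(i,t,tau-1) * b_i(x_{t-tau}); x_{t-tau} is x[t-tau-1]
                big_b *= b(i, x[t - tau - 1])
                eta = phi[i][t - tau] * big_b * d(i, tau)   # equation (33)
                best = max(best, eta)                        # step (2)
            phi[i + 1][t] = best
    return phi[num_states][T + 1]
```

Reusing `big_b` across τ costs one multiplication per step instead of recomputing a product of τ factors each time, which is the reduction in calculation volume described above.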

FIG. 6 is a block diagram of a preferred embodiment according to the principle mentioned above.

Functions of the respective sections are as follows.

901 ... Feature extraction section which transforms input voice signals into a series of feature vectors $x_1, x_2, \ldots, x_T$.

902 ... Buffer memory for temporarily storing the series of feature vectors.

903 ... Calculation section for the generation probability of the vector frame which calculates the probability density $b_i(x_{t-1})$ of a prediction error $x_{t-1}$ in a frame t−1.

904 ... Parameter memory which stores parameters of the probability density function necessary for calculation of the probability density. Namely, it stores $\gamma_i$, $W_i$ and $\Sigma_i$ in the state i = 1 to I−1.

905 ... Cumulative probability density calculation section which calculates B(i,t,τ) according to the equation (32).

907 ... Calculation section of the probability density of state holding time which calculates, for τ = 1 to T, the probability density $d_i(\tau)$ at which the tribus length of the state i becomes τ, using the parameters stored in the parameter memory 904. The probability density $d_i(\tau)$ calculated is stored in a memory 910.

906 ... Cumulative probability density memory which stores results calculated by the calculation section 905 in turn. Thereby, the calculation of the equation (33) is made by reading out the contents stored in the calculation section 905 in a recurrent manner.

908 ... Recurrence formula calculation section which reads out the contents of the memory 910 for $d_i(\tau)$ calculated and calculates the equation (31) for i = 1
to I and t = 1 to T+1 to obtain $\phi(I+1,T+1)$ by executing the above steps (1) and (2) using the contents read out together with the outputs from the cumulative probability density calculation section 905.

909 ... Memory for the intermediate cumulative probability density which stores each intermediate cumulative probability density $\phi(i,t)$ calculated by the recurrence formula calculation section according to the equation (31) in turn. The stored intermediate cumulative probability density is used for calculating the remaining recurrence formula in the recurrence formula calculating section 908.

911 ... Frame set signal generating section which sets the frame number t, the state number i and the length τ of the tribus of the state $q_i$ in turn. These values of i, t and τ are supplied to the respective sections for executing the processes mentioned above.

The final result $\phi(I,T+1)$ obtained gives the probability density of generation of the vector series $x_1, x_2, \ldots, x_T$ according to the model λ.

For word recognition, the words are denoted by v = 1 to V and each model $\lambda^v$ is prepared for each word v. When it is assumed that $\phi^v(I,T+1)$ indicates $\phi(I,T+1)$ obtained for the model $\lambda^v$ according to the processes mentioned above, the result of recognition is obtained as follows.

$$\hat{v} = \arg\max_{v} \left[ \phi^v(I,T+1) \right]$$

The calculation of the recurrence formula (30) can be simplified by taking the logarithm of both sides thereof, in which case the equations (30), (32) and (33) are transformed to the equations (30'), (32') and (33') as follows.

$$\Phi(i+1,t) = \max_{\tau} \left[ \Phi(i,t-\tau) + \Delta(i,\tau) + \Gamma(i,t,\tau) \right] \qquad (30')$$

$$\Gamma(i,t,\tau) = \Gamma(i,t,\tau-1) + \Theta(i,x_{t-\tau}) \qquad (32')$$

$$\eta(\tau) = \Phi(i,t-\tau) + \Gamma(i,t,\tau) + \Delta(i,\tau) \qquad (33')$$

wherein the following relations are defined.

$$\Phi(i,t) = \log \phi(i,t), \quad \Delta(i,\tau) = \log d_i(\tau), \quad \Gamma(i,t,\tau) = \log B(i,t,\tau), \quad \Theta(i,x_t) = \log b_i(x_t)$$

Further, the above-mentioned steps (1) and (2) become as follows.

(1) Executing the following equations for τ = 1 to t−1.

$$\Gamma(i,t,\tau) = \Gamma(i,t,\tau-1) + \Theta(i,x_{t-\tau})$$

$$\eta(\tau) = \Phi(i,t-\tau) + \Gamma(i,t,\tau) + \Delta(i,\tau)$$

(2) $\Phi(i+1,t) = \max_{\tau} [\eta(\tau)]$

In this case, the calculations made regarding the equations (30), (32) and (33) are altered to those of the equations (30'), (32') and (33'), so that the amount of calculation can be reduced a great deal even though the same result is obtained.

Various methods for the formation of the model can be considered other than the above method according to the steps (1) to (19). For instance, a method can be employed in which paths giving the maximum probability for respective r are sought according to the recurrence formula (30) or (30') and the parameters of $b_i(x)$ and those of $d_i(\tau)$ are calculated from the feature vector series corresponding to the portion of the state i of the path obtained.

Namely, with respect to $d_i(\tau)$, the frame number $\tau_i^{(r)}$ of the path corresponding to the state i is sought for every r [the equations that follow are illegible in the source].

$\gamma_i$ is given as follows [equation illegible in the source].

The estimation of the parameters $W_i$ and $\Sigma_i$ of $b_i(x)$ can be effected in the following manner.

Assuming that the frames [expression illegible in the source] correspond to the state $q_i$, wherein $\tau_i^{(r)}$ indicates the frame number corresponding to the state $q_i$, and taking [expression illegible in the source] as a teacher signal, $W_i$ is estimated by an error back-propagation method in such a manner that the following quantity is minimized [expression illegible in the source].

Also, the estimation value of $\Sigma_i$ can be obtained from the following equation [equation illegible in the source].

In this case, the frames of the state i are obtained by storing $BB(i+1,t) = t - \tau_{opt}$ for t = 1 to $T^{(r)}+1$ and i = 1 to I, wherein $\tau_{opt}$ denotes the optimal value of τ for t and i in the calculation of the recurrence formula (30) or (30'). The starting frame $t_i$ of the state i in the optimal path corresponding to r is given as follows.

$$t_I = T^{(r)}+1, \quad t_{I-1} = BB(I,t_I), \quad \ldots, \quad t_2 = BB(3,t_3), \quad t_1 = BB(2,t_2) = 1$$
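
The logarithmic recursion, the back pointers BB and the choice of the best word model can be sketched together as below. This is a sketch under assumptions, not the patent's implementation: the names `log_viterbi_segments` and `recognize`, the callables `log_b(i, frame)` and `log_d(i, tau)`, and the convention that the last state is a non-emitting termination state are all hypothetical.

```python
NEG_INF = float("-inf")


def log_viterbi_segments(x, log_b, log_d, num_states):
    """Return (Phi(I, T+1), starting frames t_1..t_I) for one word pattern.

    log_b(i, frame) and log_d(i, tau) are hypothetical log densities.
    Assumes the pattern has at least num_states - 1 frames.
    """
    T = len(x)
    Phi = [[NEG_INF] * (T + 2) for _ in range(num_states + 1)]
    BB = [[0] * (T + 2) for _ in range(num_states + 1)]
    Phi[1][1] = 0.0                                  # log of phi(1, 1) = 1
    for i in range(1, num_states):
        for t in range(2, T + 2):
            gamma = 0.0                              # Gamma(i, t, 0) = log 1
            for tau in range(1, t):
                gamma += log_b(i, x[t - tau - 1])    # Gamma recursion
                eta = Phi[i][t - tau] + gamma + log_d(i, tau)
                if eta > Phi[i + 1][t]:
                    Phi[i + 1][t] = eta
                    BB[i + 1][t] = t - tau           # frame at which state i started
    # Trace the back pointers from t_I = T + 1 down to t_1.
    starts = [0] * (num_states + 1)
    starts[num_states] = T + 1
    for i in range(num_states, 1, -1):
        starts[i - 1] = BB[i][starts[i]]
    return Phi[num_states][T + 1], starts[1:]


def recognize(x, models):
    """Pick the word whose model gives the largest Phi(I, T+1).

    models: dict mapping a word to a (log_b, log_d, num_states) tuple.
    """
    return max(models, key=lambda v: log_viterbi_segments(x, *models[v])[0])
```

The returned starting frames delimit the portion of the pattern assigned to each state, which is the segmentation from which the parameters of each state would then be re-estimated.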

In the case where the transition probability in the same state is invariant irrespective of the staying time thereof, as explained in the above preferred embodiment, it is unnecessary to consider $d_i(\tau)$ and, accordingly, loops of τ become unnecessary. Therefore, the amount of calculation can be reduced. However, the above description regarding $d_i(\tau)$ is intended to be applied also to the case in which $d_i(\tau)$ is given by a distribution in a general form such as a normal distribution or a Poisson distribution. Namely, the parameters can be estimated in that case according to the same method as mentioned above.

A recognition experiment for Japanese digit utterances was executed using the present invention.

Experimental conditions:

* Samples
For training: 51 speakers (36 males and 15 females)
For recognizing: 52 speakers (39 males and 13 females, excluding the training speakers), 520 utterances

* Number of states: 7 with loop and 1 for termination

* Number of layers of neural predictor: 3

* Number of hidden units of neural predictor: 10

The recognition rate by the HMM according to the present invention was 99.4%. This recognition rate is the highest among the recognition rates previously obtained.
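
The predictor configuration listed in the experimental conditions (three layers, ten hidden units) can be illustrated by the sketch below. Everything beyond those two numbers is an assumption made for illustration: the input is taken to be a concatenation of neighbouring feature vectors, the hidden units use tanh, the output layer is linear without biases, the weights are random and untrained, and the error vector is scored with a zero-mean Gaussian whose covariance is taken to be diagonal.

```python
import math
import random


def make_predictor(n_in, n_hidden, n_out, seed=0):
    """Random (untrained) weights for an input-hidden-output network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2


def predict(weights, context):
    """Predicted feature vector g from the context (other feature vectors)."""
    w1, w2 = weights
    hidden = [math.tanh(sum(w * c for w, c in zip(row, context))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def log_error_likelihood(x_t, prediction, variances):
    """Log density of the error x_t - prediction under a diagonal Gaussian."""
    total = 0.0
    for x, g, var in zip(x_t, prediction, variances):
        error = x - g
        total += -0.5 * (math.log(2.0 * math.pi * var) + error * error / var)
    return total


# Hypothetical sizes: 2-dimensional features, two context frames, 10 hidden units.
weights = make_predictor(n_in=4, n_hidden=10, n_out=2)
g = predict(weights, [0.1, 0.2, 0.3, 0.4])
score = log_error_likelihood([0.3, 0.5], g, [1.0, 1.0])
```

In such a setup one predictor and one set of variances would be kept per state, and `log_error_likelihood` would play the role of the logarithm of $b_i(x)$ in the recurrence formula.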

It is understood that various other modifications will 
be apparent to and can be readily made by those skilled 
in the art without departing from the scope and spirit of 
the present invention. Accordingly, it is not intended 
that the scope of the claims appended hereto be limited
to the description as set forth herein, but rather that the 
claims be construed as encompassing all the features of 
patentable novelty that reside in the present invention, 
including all features that would be treated as equiva- 
lents thereof by those skilled in the art to which the 
present invention pertains. 

What is claimed is: 

1. An analyzing system for analyzing object signals, comprising voice signals, by estimating a generation likelihood of an observation vector sequence being a time series of feature vectors X (= $x_1, \ldots, x_T$; T is a total number of frames) with use of a Markov model having a plurality of states i (i = 1, ..., N; N is a total number of states) and given transition probabilities from state i to state j (i, j = 1, ..., N), comprising:

feature extraction means for converting the object signals into the time series of feature vectors X;

a state designation means for determining a state i at a time t stochastically using said Markov model;

a plurality of predictors each of which is composed of a neural network and is provided per each state of said Markov model for generating a predictional vector $g_i(t)$ of said feature vector $x_t$ in said state i at the time t based on values of the feature vectors other than said feature vector $x_t$;

a first calculation means for calculating an error vector of said predictional vector $g_i(t)$ to said feature vector $x_t$; and

a second calculation means for calculating a generation likelihood of said error vector using a predetermined probability distribution of the error vector according to which said error vector is generated.

2. The analyzing system as claimed in claim 1 in 
which said predictor is comprised of a neural network 
having an input layer, at least one hidden layer and an 
output layer, said neural network being constructed so 
as to output a predictional vector of a feature vector 
when other feature vectors are input to said input layer. 

3. The analyzing system as claimed in claim 2 in 
which each respective layer of said neural network is 
made up of units having a non-linear input to output 
characteristic. 

4. The analyzing system as claimed in claim 1 wherein 
parameters used for defining said Markov model and 
respective predictors are determined based on a param- 
eter estimation learning sequence wherein a set of train- 
ing observation vector sequences are provided and 
renewal of said parameters is repeated until a generation 
likelihood for said training observation vector sequen- 
ces estimated by the analyzing system becomes a maxi- 
mum. 

5. A recognition system for recognizing object signals comprising voice signals, comprising:

a plurality of analyzing apparatuses each for estimating a generation likelihood of an observation vector sequence being a time series of feature vectors X (= $x_1, \ldots, x_T$; T is a total number of frames) with use of a Markov model having a plurality of states i (i = 1, ..., N; N is a total number of states) and given transition probabilities from state i to state j (i, j = 1, ..., N);

feature extraction means for converting the object signals into the time series of feature vectors X;

said each of analyzing apparatuses comprising a state designation means for determining a state i at a time t stochastically using said Markov model, a plurality of predictors each of which is composed of a neural network and is provided per each state of said Markov model for generating a predictional vector $g_i(t)$ of said feature vector $x_t$ in said state i at the time t based on values of the feature vectors other than said feature vector $x_t$, a first calculation means for calculating an error vector of said predictional vector $g_i(t)$ to said feature vector $x_t$, and a second calculation means for calculating a generation likelihood of said error vector using a predetermined probability distribution of the error vector according to which said error vector is generated, and being adapted for a category for an observation vector sequence to be categorized;

a maximum likelihood detection means which compares likelihoods obtained by said plurality of analyzing apparatuses and detects a maximum value among said likelihoods; and

a decision means for identifying said observation vector sequence to the category corresponding to one of said plurality of analyzing apparatuses which gives the maximum likelihood detected by said maximum likelihood detection means.